{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  第3章 k近邻法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1．$k$近邻法是基本且简单的分类与回归方法。$k$近邻法的基本做法是：对给定的训练实例点和输入实例点，首先确定输入实例点的$k$个最近邻训练实例点，然后利用这$k$个训练实例点的类的多数来预测输入实例点的类。\n",
    "\n",
    "2．$k$近邻模型对应于基于训练数据集对特征空间的一个划分。$k$近邻法中，当训练集、距离度量、$k$值及分类决策规则确定后，其结果唯一确定。\n",
    "\n",
    "3．$k$近邻法三要素：距离度量、$k$值的选择和分类决策规则。常用的距离度量是欧氏距离及更一般的**pL**距离。$k$值小时，$k$近邻模型更复杂；$k$值大时，$k$近邻模型更简单。$k$值的选择反映了对近似误差与估计误差之间的权衡，通常由交叉验证选择最优的$k$。\n",
    "\n",
    "常用的分类决策规则是多数表决，对应于经验风险最小化。\n",
    "\n",
    "4．$k$近邻法的实现需要考虑如何快速搜索k个最近邻点。**kd**树是一种便于对k维空间中的数据进行快速检索的数据结构。kd树是二叉树，表示对$k$维空间的一个划分，其每个结点对应于$k$维空间划分中的一个超矩形区域。利用**kd**树可以省去对大部分数据点的搜索， 从而减少搜索的计算量。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 距离度量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "设特征空间$x$是$n$维实数向量空间 ，$x_{i}, x_{j} \\in \\mathcal{X}$,$x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(n)}\\right)^{\\mathrm{T}}$,$x_{j}=\\left(x_{j}^{(1)}, x_{j}^{(2)}, \\cdots, x_{j}^{(n)}\\right)^{\\mathrm{T}}$\n",
    "，则：$x_i$,$x_j$的$L_p$距离定义为:\n",
    "\n",
    "\n",
    "$L_{p}\\left(x_{i}, x_{j}\\right)=\\left(\\sum_{i=1}^{n}\\left|x_{i}^{(i)}-x_{j}^{(l)}\\right|^{p}\\right)^{\\frac{1}{p}}$\n",
    "\n",
    "- $p= 1$  曼哈顿距离\n",
    "- $p= 2$  欧氏距离\n",
    "- $p= \\infty$   切比雪夫距离"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "from itertools import combinations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def L(x, y, p=2):\n",
    "    # x1 = [1, 1], x2 = [5,1]\n",
    "    if len(x) == len(y) and len(x) > 1:\n",
    "        sum = 0\n",
    "        for i in range(len(x)):\n",
    "            sum += math.pow(abs(x[i] - y[i]), p)\n",
    "        return math.pow(sum, 1 / p)\n",
    "    else:\n",
    "        return 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 课本例3.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "x1 = [1, 1]\n",
    "x2 = [5, 1]\n",
    "x3 = [4, 4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4.0, '1-[5, 1]')\n",
      "(4.0, '1-[5, 1]')\n",
      "(3.7797631496846193, '1-[4, 4]')\n",
      "(3.5676213450081633, '1-[4, 4]')\n"
     ]
    }
   ],
   "source": [
    "# x1, x2\n",
    "for i in range(1, 5):\n",
    "    r = {'1-{}'.format(c): L(x1, c, p=i) for c in [x2, x3]}\n",
    "    print(min(zip(r.values(), r.keys())))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "python实现，遍历所有数据点，找出$n$个距离最近的点的分类情况，少数服从多数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import train_test_split\n",
    "from collections import Counter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# data\n",
    "iris = load_iris()\n",
    "df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
    "df['label'] = iris.target\n",
    "df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n",
    "# data = np.array(df.iloc[:100, [0, 1, -1]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sepal length</th>\n",
       "      <th>sepal width</th>\n",
       "      <th>petal length</th>\n",
       "      <th>petal width</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.7</td>\n",
       "      <td>3.2</td>\n",
       "      <td>1.3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4.6</td>\n",
       "      <td>3.1</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5.0</td>\n",
       "      <td>3.6</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.4</td>\n",
       "      <td>3.9</td>\n",
       "      <td>1.7</td>\n",
       "      <td>0.4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>4.6</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5.0</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>4.4</td>\n",
       "      <td>2.9</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>4.9</td>\n",
       "      <td>3.1</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>5.4</td>\n",
       "      <td>3.7</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4.8</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.6</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>4.8</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>4.3</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.1</td>\n",
       "      <td>0.1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>5.8</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.2</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>5.7</td>\n",
       "      <td>4.4</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>5.4</td>\n",
       "      <td>3.9</td>\n",
       "      <td>1.3</td>\n",
       "      <td>0.4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>5.7</td>\n",
       "      <td>3.8</td>\n",
       "      <td>1.7</td>\n",
       "      <td>0.3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.8</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>5.4</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.7</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.7</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>4.6</td>\n",
       "      <td>3.6</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>5.1</td>\n",
       "      <td>3.3</td>\n",
       "      <td>1.7</td>\n",
       "      <td>0.5</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>4.8</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.9</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>5.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.6</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>5.0</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.6</td>\n",
       "      <td>0.4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>5.2</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>5.2</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>4.7</td>\n",
       "      <td>3.2</td>\n",
       "      <td>1.6</td>\n",
       "      <td>0.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120</th>\n",
       "      <td>6.9</td>\n",
       "      <td>3.2</td>\n",
       "      <td>5.7</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121</th>\n",
       "      <td>5.6</td>\n",
       "      <td>2.8</td>\n",
       "      <td>4.9</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>7.7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>6.7</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>6.3</td>\n",
       "      <td>2.7</td>\n",
       "      <td>4.9</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>124</th>\n",
       "      <td>6.7</td>\n",
       "      <td>3.3</td>\n",
       "      <td>5.7</td>\n",
       "      <td>2.1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>125</th>\n",
       "      <td>7.2</td>\n",
       "      <td>3.2</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>126</th>\n",
       "      <td>6.2</td>\n",
       "      <td>2.8</td>\n",
       "      <td>4.8</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>127</th>\n",
       "      <td>6.1</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.9</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128</th>\n",
       "      <td>6.4</td>\n",
       "      <td>2.8</td>\n",
       "      <td>5.6</td>\n",
       "      <td>2.1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129</th>\n",
       "      <td>7.2</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.8</td>\n",
       "      <td>1.6</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130</th>\n",
       "      <td>7.4</td>\n",
       "      <td>2.8</td>\n",
       "      <td>6.1</td>\n",
       "      <td>1.9</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>131</th>\n",
       "      <td>7.9</td>\n",
       "      <td>3.8</td>\n",
       "      <td>6.4</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132</th>\n",
       "      <td>6.4</td>\n",
       "      <td>2.8</td>\n",
       "      <td>5.6</td>\n",
       "      <td>2.2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133</th>\n",
       "      <td>6.3</td>\n",
       "      <td>2.8</td>\n",
       "      <td>5.1</td>\n",
       "      <td>1.5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>134</th>\n",
       "      <td>6.1</td>\n",
       "      <td>2.6</td>\n",
       "      <td>5.6</td>\n",
       "      <td>1.4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>135</th>\n",
       "      <td>7.7</td>\n",
       "      <td>3.0</td>\n",
       "      <td>6.1</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>136</th>\n",
       "      <td>6.3</td>\n",
       "      <td>3.4</td>\n",
       "      <td>5.6</td>\n",
       "      <td>2.4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>6.4</td>\n",
       "      <td>3.1</td>\n",
       "      <td>5.5</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138</th>\n",
       "      <td>6.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4.8</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>6.9</td>\n",
       "      <td>3.1</td>\n",
       "      <td>5.4</td>\n",
       "      <td>2.1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140</th>\n",
       "      <td>6.7</td>\n",
       "      <td>3.1</td>\n",
       "      <td>5.6</td>\n",
       "      <td>2.4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>141</th>\n",
       "      <td>6.9</td>\n",
       "      <td>3.1</td>\n",
       "      <td>5.1</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>142</th>\n",
       "      <td>5.8</td>\n",
       "      <td>2.7</td>\n",
       "      <td>5.1</td>\n",
       "      <td>1.9</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>143</th>\n",
       "      <td>6.8</td>\n",
       "      <td>3.2</td>\n",
       "      <td>5.9</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>144</th>\n",
       "      <td>6.7</td>\n",
       "      <td>3.3</td>\n",
       "      <td>5.7</td>\n",
       "      <td>2.5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>145</th>\n",
       "      <td>6.7</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.2</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146</th>\n",
       "      <td>6.3</td>\n",
       "      <td>2.5</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.9</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147</th>\n",
       "      <td>6.5</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.2</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>148</th>\n",
       "      <td>6.2</td>\n",
       "      <td>3.4</td>\n",
       "      <td>5.4</td>\n",
       "      <td>2.3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>5.9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.1</td>\n",
       "      <td>1.8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>150 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     sepal length  sepal width  petal length  petal width  label\n",
       "0             5.1          3.5           1.4          0.2      0\n",
       "1             4.9          3.0           1.4          0.2      0\n",
       "2             4.7          3.2           1.3          0.2      0\n",
       "3             4.6          3.1           1.5          0.2      0\n",
       "4             5.0          3.6           1.4          0.2      0\n",
       "5             5.4          3.9           1.7          0.4      0\n",
       "6             4.6          3.4           1.4          0.3      0\n",
       "7             5.0          3.4           1.5          0.2      0\n",
       "8             4.4          2.9           1.4          0.2      0\n",
       "9             4.9          3.1           1.5          0.1      0\n",
       "10            5.4          3.7           1.5          0.2      0\n",
       "11            4.8          3.4           1.6          0.2      0\n",
       "12            4.8          3.0           1.4          0.1      0\n",
       "13            4.3          3.0           1.1          0.1      0\n",
       "14            5.8          4.0           1.2          0.2      0\n",
       "15            5.7          4.4           1.5          0.4      0\n",
       "16            5.4          3.9           1.3          0.4      0\n",
       "17            5.1          3.5           1.4          0.3      0\n",
       "18            5.7          3.8           1.7          0.3      0\n",
       "19            5.1          3.8           1.5          0.3      0\n",
       "20            5.4          3.4           1.7          0.2      0\n",
       "21            5.1          3.7           1.5          0.4      0\n",
       "22            4.6          3.6           1.0          0.2      0\n",
       "23            5.1          3.3           1.7          0.5      0\n",
       "24            4.8          3.4           1.9          0.2      0\n",
       "25            5.0          3.0           1.6          0.2      0\n",
       "26            5.0          3.4           1.6          0.4      0\n",
       "27            5.2          3.5           1.5          0.2      0\n",
       "28            5.2          3.4           1.4          0.2      0\n",
       "29            4.7          3.2           1.6          0.2      0\n",
       "..            ...          ...           ...          ...    ...\n",
       "120           6.9          3.2           5.7          2.3      2\n",
       "121           5.6          2.8           4.9          2.0      2\n",
       "122           7.7          2.8           6.7          2.0      2\n",
       "123           6.3          2.7           4.9          1.8      2\n",
       "124           6.7          3.3           5.7          2.1      2\n",
       "125           7.2          3.2           6.0          1.8      2\n",
       "126           6.2          2.8           4.8          1.8      2\n",
       "127           6.1          3.0           4.9          1.8      2\n",
       "128           6.4          2.8           5.6          2.1      2\n",
       "129           7.2          3.0           5.8          1.6      2\n",
       "130           7.4          2.8           6.1          1.9      2\n",
       "131           7.9          3.8           6.4          2.0      2\n",
       "132           6.4          2.8           5.6          2.2      2\n",
       "133           6.3          2.8           5.1          1.5      2\n",
       "134           6.1          2.6           5.6          1.4      2\n",
       "135           7.7          3.0           6.1          2.3      2\n",
       "136           6.3          3.4           5.6          2.4      2\n",
       "137           6.4          3.1           5.5          1.8      2\n",
       "138           6.0          3.0           4.8          1.8      2\n",
       "139           6.9          3.1           5.4          2.1      2\n",
       "140           6.7          3.1           5.6          2.4      2\n",
       "141           6.9          3.1           5.1          2.3      2\n",
       "142           5.8          2.7           5.1          1.9      2\n",
       "143           6.8          3.2           5.9          2.3      2\n",
       "144           6.7          3.3           5.7          2.5      2\n",
       "145           6.7          3.0           5.2          2.3      2\n",
       "146           6.3          2.5           5.0          1.9      2\n",
       "147           6.5          3.0           5.2          2.0      2\n",
       "148           6.2          3.4           5.4          2.3      2\n",
       "149           5.9          3.0           5.1          1.8      2\n",
       "\n",
       "[150 rows x 5 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x2c56f7f64e0>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X+UXWV97/H3xzCaVIFcIVbIBINic+VXCUQQ40UFWzSkgQvIj1VtI1y59aLoouISaxGxV1CsUsq6WBArijcYKYYfClSh+BtwApgoiGLFZgZuiUECaIAQvvePvedkcjgzc/bMec7Ze5/Pa61ZM3ufffZ8n31gvtn7eb7Po4jAzMwM4Hm9DsDMzMrDScHMzBqcFMzMrMFJwczMGpwUzMyswUnBzMwanBTMzKzBScHMzBqcFMzMrGG71L9A0gxgCBiJiKVNry0HzgdG8l0XRcTnJjrfzjvvHPPnz08QqZlZfa1evfo3ETFnsuOSJwXgvcC9wA7jvP6ViHh3uyebP38+Q0NDHQnMzKxfSPp1O8clfXwkaRA4ApjwX/9mZlYOqfsULgA+ADw7wTHHSFoj6SpJ81odIOkUSUOShtavX58kUDMzS5gUJC0FHo6I1RMcdh0wPyL2Bb4FXN7qoIi4JCIWRcSiOXMmfSRmZmZTlLJPYTGwTNISYCawg6QrIuJtowdExIYxx18KfCJhPGZm07J582aGh4d58sknex3KuGbOnMng4CADAwNTen+ypBARZwJnAkh6A/D+sQkh379LRDyUby4j65A2Myul4eFhtt9+e+bPn4+kXofzHBHBhg0bGB4eZvfdd5/SObpepyDpHEnL8s3TJP1U0o+B04Dl3Y7HzKxdTz75JDvttFMpEwKAJHbaaadp3cl0Y0gqEXErcGv+81lj9jfuJszqZtVdI5x/0308+Ogmdp09izMOX8BRC+f2OiybprImhFHTja8rScGs36y6a4Qzr17Lps1bABh5dBNnXr0WwInBSs3TXJglcP5N9zUSwqhNm7dw/k339Sgiq4sbb7yRBQsWsMcee3Deeed1/PxOCmYJPPjopkL7zdqxZcsWTj31VG644QbuueceVqxYwT333NPR3+HHR2YJ7Dp7FiMtEsCus2f1IBrrlU73K91xxx3ssccevPzlLwfghBNO4JprrmHPPffsVMi+UzBL4YzDFzBrYMY2+2YNzOCMwxf0KCLrttF+pZFHNxFs7VdaddfIpO8dz8jICPPmbZ34YXBwkJGRqZ+vFScFswSOWjiXc4/eh7mzZyFg7uxZnHv0Pu5k7iMp+pUi4jn7Oj0ayo+PzBI5auFcJ4E+lqJfaXBwkHXr1jW2h4eH2XXXXad8vlZ8p2BmlsB4/UfT6Vd69atfzS9+8Qt+9atf8fTTT3PllVeybNmyyd9YgJOCmVkCKfqVtttuOy666CIOP/xwXvWqV3Hcccex1157TTfUbX9HR89mZmbA1iLFTle1L1myhCVLlnQixJacFMzMEqliv5IfH5mZWYOTgpmZNTgpmJlZg5OCmZk1OCmYmVmDk4L1vVV3jbD4vFvY/YNfZ/F5t0xrbhqz1E466SRe8pKXsPfeeyc5v5OC9bUUk5aZpbR8+XJuvPHGZOd3UrC+5sVwLKk1K+Eze8PZs7Pva1ZO+5SHHHIIL37xizsQXGsuXrO+5sVwLJk1K+G602Bz/t/SxnXZNsC+x/Uurkn4TsH6WopJy8wAuPmcrQlh1OZN2f4Sc1KwvubFcCyZjcPF9peEHx9ZX0s1aZkZOw5mj4xa7S8xJwXre1WctMwq4LCztu1TABiYle2fhhNPPJFbb72V3/zmNwwODvLRj36Uk08+eZrBbuWkYD3T6UXNzUpltDP55nOyR0Y7DmYJYZqdzCtWrOhAcONzUrCeGK0PGB0OOlofADgxWH3se1ypRxq14o5m6wnXB5iVk5OC9YTrA6yqIqLXIUxouvE5KVhPuD7AqmjmzJls2LChtIkhItiwYQMzZ86c8jncp2A9ccbhC7bpUwDXB1j5DQ4OMjw8zPr163sdyrhmzpzJ4ODUh706KVhPuD7AqmhgYIDdd9+912EklTwpSJoBDAEjEbG06bUXAF8EDgA2AMdHxAOpY7JycH2AWfl0407hvcC9wA4tXjsZ+G1E7CHpBOATwPFdiMmsVFyzYWWRtKNZ0iBwBPC5cQ45Erg8//kq4DBJShmTWdl4TQcrk9Sjjy4APgA8O87rc4F1ABHxDLAR2ClxTGal4poNK5NkSUHSUuDhiFg90WEt9j1nrJekUyQNSRoqc6+/2VS4ZsPKJOWdwmJgmaQHgCuBQyVd0XTMMDAPQNJ2wI7AI80niohLImJRRCyaM2dOwpDNus81G1YmyZJCRJwZEYMRMR84AbglIt7WdNi1wF/mPx+bH1POqhCzRLymg5VJ1+sUJJ0DDEXEtcBlwJck3U92h3BCt+Mx6zXXbFiZqGr/MF+0aFEMDQ31Ogwzs0qRtDoiFk12nCuarXY+vGotK25fx5YIZkiceNA8/u6ofXodllklOClYrXx41VquuO0/GttbIhrbTgxmk/MsqVYrK25vsSbuBPvNbFtOClYrW8bpIxtvv5lty0nBamXGOLOkjLffzLblpGC1cuJB8wrtN7NtuaPZamW0M9mjj8ymxnUKZmZ9wHUK1hN/fukP+f4vt05ftfgVL+bL7zy4hxH1jtdIsCpyn4J1THNCAPj+Lx/hzy/9YY8i6h2vkWBV5aRgHdOcECbbX2deI8GqyknBLAGvkWBV5aRgloDXSLCqclKwjln8ihcX2l9nXiPBqspJwTrmy+88+DkJoF9HHx21cC7nHr0Pc2fPQsDc2bM49+h9PPrISs91CmZmfcB1CtYTqcbmFzmv6wPMps5JwTpmdGz+6FDM0bH5wLT+KBc5b6oYzPqF+xSsY1KNzS9yXtcHmE2Pk4J1TKqx+UXO6/oAs+lxUrCOSTU2v8h5XR9gNj1OCtYxqcbmFzmv6wPMpscdzdYxox25nR75U+S8qWIw6xeuUzAz6wOuUyipMoyhLxpDGWI2s+5wUuiiMoyhLxpDGWI2s+5xR3MXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPk0IXlWEMfdEYyhCzmXWPO5q7qAxj6IvGUIaYzax7ktUpSJoJfAd4AVnyuSoiPtJ0zHLgfGAk33VRRHxuovO6TsHMrLgy1Ck8BRwaEU9IGgC+J+mGiLit6bivRMS7E8Zh0/ThVWtZcfs6tkQwQ+LEg+bxd0ftM+1jy1L/UJY4zMpg0qQg6QXAMcD8scdHxDkTvS+yW5An8s2B/Kta5dPGh1et5Yrb/qOxvSWisd38x77IsWWpfyhLHGZl0U5H8zXAkcAzwO/GfE1K0gxJdwMPA9+MiNtbHHaMpDWSrpI0r824rUtW3L6u7f1Fji1L/UNZ4jAri3YeHw1GxJuncvKI2ALsJ2k28DVJe0fET8Ycch2wIiKekvRXwOXAoc3nkXQKcArAbrvtNpVQbIq2jNPn1Gp/kWPLUv9QljjMyqKdO4UfSGr9ULhNEfEocCvw5qb9GyLiqXzzUuCAcd5/SUQsiohFc+bMmU4oVtAMqe39RY4tS/1DWeIwK4txk4KktZLWAK8D7pR0X/6YZ3T/hCTNye8QkDQLeBPws6ZjdhmzuQy4dyqNsHROPKj1E71W+4scW5b6h7LEYVYWEz0+WjrNc+8CXC5pBlnyWRkR10s6BxiKiGuB0yQtI+uveARYPs3faR022kHczoiiIseWpf6hLHGYlcWkdQqSvhQRb59sX7e4TsHMrLhO1ins1XTiGYzz7N8ml2pMfJH6gJTnLtK+Kl6LylmzEm4+BzYOw46DcNhZsO9xvY7KSmzcpCDpTOBDwCxJj43uBp4GLulCbLWTakx8kfqAlOcu0r4qXovKWbMSrjsNNucjqTauy7bBicHGNW5Hc0ScGxHbA+dHxA751/YRsVNEnNnFGGsj1Zj4IvUBKc9dpH1VvBaVc/M5WxPCqM2bsv1m45joTmH//Mevjvm5ISLuTBZVTaUaE1+kPiDluYu0r4rXonI2Dhfbb8bEfQp/n3+fCSwCfkz2+Ghf4HayoapWwK6zZzHS4o/edMfEz5Ba/tEbr24g1bmLtK+K16JydhzMHhm12m82jokeH70xIt4I/BrYPy8eOwBYCNzfrQDrJNWY+CL1ASnPXaR9VbwWlXPYWTDQlGQHZmX7zcbRzuij/xoRa0c3IuInkvZLGFNtpRoTX6Q+IOW5i7SviteickY7kz36yApop05hBdkEeFeQzXL6NuBFEXFi+vCey3UKZmbFdbJO4R3Au4D35tvfAS6eRmxWMWWoPbCKc71EZUyaFCLiSeAz+Zf1mTLUHljFuV6iUiaaEG9l/n1tPhHeNl/dC9F6qQy1B1ZxrpeolInuFEYfF013YjyrsDLUHljFuV6iUiYakvpQ/uNhwPMj4tdjv7oTnvVakfUGvDaBtTReXYTrJUqpnUV25gP/JOmXklZKeo+HpPaPMtQeWMW5XqJS2uloPgsaC+W8EzgDuACYMdH7rB7KUHtgFed6iUppp07hw8Bi4EXAXcD3gO+OebzUVa5TMDMrrpN1CkeTrYz2deDbwG35MNVaSzXevsh5y7IugGsPSqbuY/7r3r4ienAt2nl8tL+k7ckmwPsT4FJJ/xkRtZ0QL9V4+yLnLcu6AK49KJm6j/mve/uK6NG1mLSjWdLeZFNb/CVwPDAM3JIsohJINd6+yHnLsi6Aaw9Kpu5j/uveviJ6dC3aeXz0CbLHRhcCP4qIzUkjKoFU4+2LnLcs6wK49qBk6j7mv+7tK6JH12LSO4WIOCIiPhkRP+iHhADpxtsXOe948/93e10A1x6UTN3H/Ne9fUX06Fq0U6fQd1KNty9y3rKsC+Dag5Kp+5j/ureviB5di3YeH/WdVOPti5y3LOsCuPagZOo+5r/u7SuiR9di0jqFsnGdgplZcdOuU5B0HdmiOi1FxLIpxtbXXP9gVhHXnw6rvwCxBTQDDlgOSz89/fOWvA5josdHn+paFH3C9Q9mFXH96TB02dbt2LJ1ezqJoQJ1GBPNkvrtib66GWRduP7BrCJWf6HY/nZVoA5j0o5mSa8EzgX2BGaO7o+IlyeMq5Zc/2BWEbGl2P52VaAOo50hqf9MtibzM8AbgS8CX0oZVF25/sGsIjTOJNDj7W9XBeow2kkKsyLiZrKRSr+OiLOBQ9OGVU+ufzCriAOWF9vfrgrUYbRTp/CkpOcBv5D0bmAEeEnasOrJ9Q9mFTHamdzp0UcVqMNoZz2FVwP3ArOBjwE7Ap+MiNvSh/dcrlMwMyuuY+spRMSP8hM+DzgtIh5vM4CZwHeAF+S/56qI+EjTMS8g66M4ANgAHB8RD7Rz/qKK1gdUbQ2BIrUHdb8WSceBFxm7niqOlO0r+Rj6aSnatjpfiwm0M/poEVln8/b59kbgpIhYPclbnwIOjYgnJA0A35N0Q9MdxsnAbyNiD0knkM3IevxUGjKRovUBVVtDoEjtQd2vRdJx4EXGrqeKI2X7KjCGfsqKtq3O12IS7XQ0fx74XxExPyLmA6eSJYkJReaJfHMg/2p+VnUkcHn+81XAYVLnh8EUrQ+o2hoCRWoP6n4tko4DLzJ2PVUcKdtXgTH0U1a0bXW+FpNoJyk8HhHfHd2IiO8B7T5CmiHpbuBh4JsRcXvTIXOBdfl5nwE2Aju1OM8pkoYkDa1fv76dX72NovUBVVtDoEjtQd2vRdJx4EXGrqeKI2X7KjCGfsqKtq3O12IS7SSFOyT9k6Q3SHq9pP8D3Cppf0n7T/TGiNgSEfsBg8CB+SpuY7W6K3jOX7KIuCQiFkXEojlz5rQR8raK1gdUbQ2BIrUHdb8WSceBFxm7niqOlO2rwBj6KSvatjpfi0m0kxT2A/4I+AhwNvAq4LXA39Pm/EgR8ShwK/DmppeGgXkAkrYjG9n0SDvnLKJofUDV1hAoUntQ92uRdBx4kbHrqeJI2b4KjKGfsqJtq/O1mEQ7o4/eOJUTS5oDbI6IRyXNAt5E1pE81rVkaz//EDgWuCUSzOVdtD6gamsIFKk9qPu1SDoOvMjY9VRxpGxfBcbQT1nRttX5WkyinTqFPwQ+DuwaEW+RtCdwcERcNsn79iXrRJ5BdkeyMiLOkXQOMBQR1+bDVr8ELCS7QzghIv59ovO6TsHMrLiO1SkAXyAbbfQ3+fbPga8AEyaFiFhD9se+ef9ZY35+EnhrGzGYmVkXtNOnsHNErASehcYooWlOFVh+q+4aYfF5t7D7B7/O4vNuYdVdI70OycpgzUr4zN5w9uzs+5qVnTk2laIxlKF9VTtvzbRzp/A7STuRjwqS9BqyoaO1VbmCLeuOIgVNZSh+SlmwVbXivDJ8HhXRzp3C6WQdwq+Q9H2yaSnekzSqHqtcwZZ1R5GCpjIUP6Us2KpacV4ZPo+KaGf00Z2SXg8sIKsruC8iNiePrIcqV7Bl3VGkoKkMxU8pC7aqVpxXhs+jIia9U5D0VrI1FX4KHAV8ZbKitaqrXMGWdUeRgqYyFD+lLNiqWnFeGT6Pimjn8dHfRsTjkl4HHE42zPTitGH1VuUKtqw7ihQ0laH4KWXBVtWK88rweVREO0lh9OH6EcDFEXEN8Px0IfXeUQvncu7R+zB39iwEzJ09i3OP3sedzP1u3+Pgzy6EHecByr7/2YWtOyqLHFuGeIsen6p9VTtvDbVTvHY92WprbyJb92ATcEdE/HH68J7LxWtmZsV1snjtOLI5iz6VT1mxC3DGdAM0q70iC/KURdViLstCOGWJowPaGX30e+DqMdsPAQ+lDMqs8oosyFMWVYu5LLUHZYmjQ9rpUzCzooosyFMWVYu5LLUHZYmjQ5wUzFIosiBPWVQt5rLUHpQljg5xUjBLociCPGVRtZjLUntQljg6xEnBLIUiC/KURdViLkvtQVni6BAnBbMUln4aFp289V/ZmpFtl7HDdlTVYi5L7UFZ4uiQSesUysZ1CmZmxXWyTsEsjSqO7U4Vc6r6gCpeY+spJwXrjSqO7U4Vc6r6gCpeY+s59ylYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPOSlYb1RxbHeqmFPVB1TxGlvPuU7BzKwPtFun4DsFszUr4TN7w9mzs+9rVnb/vKliMCvIdQrW31KN5S9yXtcTWIn4TsH6W6qx/EXO63oCKxEnBetvqcbyFzmv6wmsRJwUrL+lGstf5LyuJ7AScVKw/pZqLH+R87qewErEScH6W6qx/EXO63oCK5FkdQqS5gFfBF4KPAtcEhH/0HTMG4BrgF/lu66OiAl711ynYGZWXBnWU3gG+OuIuFPS9sBqSd+MiHuajvtuRCxNGId1UxXn7y8ScxXbVwa+bpWRLClExEPAQ/nPj0u6F5gLNCcFq4sqjrd3PUF6vm6V0pU+BUnzgYXA7S1ePljSjyXdIGmvbsRjiVRxvL3rCdLzdauU5BXNkl4E/Avwvoh4rOnlO4GXRcQTkpYAq4BXtjjHKcApALvttlviiG3Kqjje3vUE6fm6VUrSOwVJA2QJ4csRcXXz6xHxWEQ8kf/8DWBA0s4tjrskIhZFxKI5c+akDNmmo4rj7V1PkJ6vW6UkSwqSBFwG3BsRLecAlvTS/DgkHZjHsyFVTJZYFcfbu54gPV+3Skn5+Ggx8HZgraS7830fAnYDiIjPAscC75L0DLAJOCGqNpe3bTXaaVilUSZFYq5i+8rA161SvJ6CmVkfKEOdgpWVx4xv6/rTYfUXILZkq54dsHz6q56ZVZSTQr/xmPFtXX86DF22dTu2bN12YrA+5LmP+o3HjG9r9ReK7TerOSeFfuMx49uKLcX2m9Wck0K/8ZjxbWlGsf1mNeek0G88ZnxbBywvtt+s5pwU+o3n7t/W0k/DopO33hloRrbtTmbrU65TMDPrA65T6KJVd41w/k338eCjm9h19izOOHwBRy2c2+uwOqfudQ11b18Z+BpXhpPCNK26a4Qzr17Lps3ZaJWRRzdx5tVrAeqRGOpe11D39pWBr3GluE9hms6/6b5GQhi1afMWzr/pvh5F1GF1r2uoe/vKwNe4UpwUpunBRzcV2l85da9rqHv7ysDXuFKcFKZp19mzCu2vnLrXNdS9fWXga1wpTgrTdMbhC5g1sG2h06yBGZxx+IIeRdRhda9rqHv7ysDXuFLc0TxNo53JtR19VPe58OvevjLwNa4U1ymYmfWBdusU/PjIrM7WrITP7A1nz86+r1lZjXNbz/jxkVldpawPcO1BbflOwayuUtYHuPagtpwUzOoqZX2Aaw9qy0nBrK5S1ge49qC2nBTM6iplfYBrD2rLScGsrlKuneF1OWrLdQpmZn3AdQpmZlaYk4KZmTU4KZiZWYOTgpmZNTgpmJlZg5OCmZk1OCmYmVmDk4KZmTUkSwqS5kn6N0n3SvqppPe2OEaSLpR0v6Q1kvZPFY9Ng+fNN+sbKddTeAb464i4U9L2wGpJ34yIe8Yc8xbglfnXQcDF+XcrC8+bb9ZXkt0pRMRDEXFn/vPjwL1A88LFRwJfjMxtwGxJu6SKyabA8+ab9ZWu9ClImg8sBG5vemkusG7M9jDPTRxIOkXSkKSh9evXpwrTWvG8+WZ9JXlSkPQi4F+A90XEY80vt3jLc2boi4hLImJRRCyaM2dOijBtPJ4336yvJE0KkgbIEsKXI+LqFocMA/PGbA8CD6aMyQryvPlmfSXl6CMBlwH3RsSnxznsWuAv8lFIrwE2RsRDqWKyKfC8+WZ9JeXoo8XA24G1ku7O930I2A0gIj4LfANYAtwP/B54R8J4bKr2Pc5JwKxPJEsKEfE9WvcZjD0mgFNTxWBmZsW4otnMzBqcFMzMrMFJwczMGpwUzMyswUnBzMwanBTMzKzBScHMzBqUlQpUh6T1wK97Hcc4dgZ+0+sgEnL7qqvObQO3rx0vi4hJJ4+rXFIoM0lDEbGo13Gk4vZVV53bBm5fJ/nxkZmZNTgpmJlZg5NCZ13S6wASc/uqq85tA7evY9ynYGZmDb5TMDOzBieFKZA0Q9Jdkq5v8dpySesl3Z1//Y9exDgdkh6QtDaPf6jF65J0oaT7Ja2RtH8v4pyKNtr2Bkkbx3x+lVpiTtJsSVdJ+pmkeyUd3PR6ZT87aKt9lf38JC0YE/fdkh6T9L6mY5J/fikX2amz9wL3AjuM8/pXIuLdXYwnhTdGxHjjot8CvDL/Ogi4OP9eFRO1DeC7EbG0a9F01j8AN0bEsZKeD/xB0+tV/+wmax9U9POLiPuA/SD7hycwAnyt6bDkn5/vFAqSNAgcAXyu17H00JHAFyNzGzBb0i69DqrfSdoBOIRsGVwi4umIeLTpsMp+dm22ry4OA34ZEc2Fusk/PyeF4i4APgA8O8Exx+S3dldJmteluDopgH+VtFrSKS1enwusG7M9nO+rgsnaBnCwpB9LukHSXt0MbppeDqwH/jl/vPk5SS9sOqbKn1077YPqfn5jnQCsaLE/+efnpFCApKXAwxGxeoLDrgPmR8S+wLeAy7sSXGctjoj9yW5VT5V0SNPrrZZZrcowtsnadifZdAB/DPwjsKrbAU7DdsD+wMURsRD4HfDBpmOq/Nm1074qf34A5I/FlgFfbfVyi30d/fycFIpZDCyT9ABwJXCopCvGHhARGyLiqXzzUuCA7oY4fRHxYP79YbJnmgc2HTIMjL0DGgQe7E500zNZ2yLisYh4Iv/5G8CApJ27HujUDAPDEXF7vn0V2R/R5mMq+dnRRvsq/vmNegtwZ0T8Z4vXkn9+TgoFRMSZETEYEfPJbu9uiYi3jT2m6fneMrIO6cqQ9EJJ24/+DPwp8JOmw64F/iIfCfEaYGNEPNTlUAtrp22SXipJ+c8Hkv0/sqHbsU5FRPw/YJ2kBfmuw4B7mg6r5GcH7bWvyp/fGCfS+tERdOHz8+ijDpB0DjAUEdcCp0laBjwDPAIs72VsU/CHwNfy/6+2A/5vRNwo6a8AIuKzwDeAJcD9wO+Bd/Qo1qLaaduxwLskPQNsAk6IalV4vgf4cv4I4t+Bd9Tksxs1Wfsq/flJ+gPgT4D/OWZfVz8/VzSbmVmDHx+ZmVmDk4KZmTU4KZiZWYOTgpmZNTgpmJlZg5OCWUH5TJytZshtub8Dv+8oSXuO2b5VUm3XI7beclIwK7+jgD0nPcqsA5wUrHbyyuWv55Oi/UTS8fn+AyR9O58M76bR6vP8X94XSPpBfvyB+f4D83135d8XTPR7W8TweUk/yt9/ZL5/uaSrJd0o6ReSPjnmPSdL+nkez6WSLpL0WrLK+POVzbH/ivzwt0q6Iz/+v3Xo0pm5otlq6c3AgxFxBICkHSUNkE2QdmRErM8Txf8GTsrf88KIeG0+Qd7ngb2BnwGHRMQzkt4EfBw4ps0Y/oZsGpSTJM0G7pD0rfy1/YCFwFPAfZL+EdgC/C3ZXD6PA7cAP46IH0i6Frg+Iq7K2wOwXUQcKGkJ8BHgTVO5UGbNnBSsjtYCn5L0CbI/pt+VtDfZH/pv5n9UZwBj54xZARAR35G0Q/6HfHvgckmvJJuJcqBADH9KNnni+/PtmcBu+c83R8RGAEn3AC8Ddga+HRGP5Pu/CvzRBOe/Ov++GphfIC6zCTkpWO1ExM8lHUA2R8y5kv6VbEbUn0bEweO9rcX2x4B/i4j/Lmk+cGuBMAQck6+mtXWndBDZHcKoLWT/H7aaEnkio+cYfb9ZR7hPwWpH0q7A7yPiCuBTZI9k7gPmKF/TV9KAtl2AZbTf4XVkM09uBHYkWxIRik9seBPwnjEzdi6c5Pg7gNdL+i+StmPbx1SPk921mCXnf2FYHe1D1jH7LLAZeFdEPC3pWOBCSTuS/bd/AfDT/D2/lfQDsnW3R/sZPkn2+Oh0smf8RXwsP/+aPDE8AIy7bnBEjEj6OHA72fz49wAb85evBC6VdBrZLKBmyXiWVOt7km4F3h8RQz2O40UR8UR+p/A14PMR0bxwu1lSfnxkVh5nS7qbbOGfX1HBpSSt+nynYGZmDb45Mg6qAAAAJklEQVRTMDOzBicFMzNrcFIwM7MGJwUzM2twUjAzswYnBTMza/j/pgzYEDEDapEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x2c55bd57e48>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
    "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
    "plt.xlabel('sepal length')\n",
    "plt.ylabel('sepal width')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.array(df.iloc[:100, [0, 1, -1]])\n",
    "X, y = data[:,:-1], data[:,-1]\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "class KNN:\n",
    "    def __init__(self, X_train, y_train, n_neighbors=3, p=2):\n",
    "        \"\"\"\n",
    "        parameter: n_neighbors 临近点个数\n",
    "        parameter: p 距离度量\n",
    "        \"\"\"\n",
    "        self.n = n_neighbors\n",
    "        self.p = p\n",
    "        self.X_train = X_train\n",
    "        self.y_train = y_train\n",
    "\n",
    "    def predict(self, X):\n",
    "        # 取出n个点\n",
    "        knn_list = []\n",
    "        for i in range(self.n):\n",
    "            dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
    "            knn_list.append((dist, self.y_train[i]))\n",
    "\n",
    "        for i in range(self.n, len(self.X_train)):\n",
    "            max_index = knn_list.index(max(knn_list, key=lambda x: x[0]))\n",
    "            dist = np.linalg.norm(X - self.X_train[i], ord=self.p)\n",
    "            if knn_list[max_index][0] > dist:\n",
    "                knn_list[max_index] = (dist, self.y_train[i])\n",
    "\n",
    "        # 统计\n",
    "        knn = [k[-1] for k in knn_list]\n",
    "        count_pairs = Counter(knn)\n",
    "#         max_count = sorted(count_pairs, key=lambda x: x)[-1]\n",
    "        max_count = sorted(count_pairs.items(), key=lambda x: x[1])[-1][0]\n",
    "        return max_count\n",
    "\n",
    "    def score(self, X_test, y_test):\n",
    "        right_count = 0\n",
    "        n = 10\n",
    "        for X, y in zip(X_test, y_test):\n",
    "            label = self.predict(X)\n",
    "            if label == y:\n",
    "                right_count += 1\n",
    "        return right_count / len(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "clf = KNN(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.95"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test Point: 1.0\n"
     ]
    }
   ],
   "source": [
    "test_point = [6.0, 3.0]\n",
    "print('Test Point: {}'.format(clf.predict(test_point)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x2c56f8b0d68>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XucVXW9//HXx3ESUIRCOgkDjZfipwJxGW/hJS8dvJAa3tOK9Bc//VlWJiXZ8ULH1PCkx3ykB9MyRZQ4hJcSzQtpmdgAI6ho4kllBn9HQkER9AB+fn+sNZthu2dmr9l77b3W2u/n4zGPmfXda3/ns9bW+bDW+n6+X3N3REREALardgAiIpIcSgoiIpKjpCAiIjlKCiIikqOkICIiOUoKIiKSo6QgIiI5SgoiIpKjpCAiIjnbx/0LzKwOaAba3H1C3muTgOlAW9h0g7v/oqv+dtllF29sbIwhUhGR7Fq0aNE/3H1gd/vFnhSAbwHLgZ07ef1ud/9GsZ01NjbS3NxclsBERGqFmb1azH6x3j4yswbgWKDLf/2LiEgyxP1M4Trge8AHXexzopktNbM5Zjak0A5mNtnMms2sefXq1bEEKiIiMSYFM5sAvOHui7rY7T6g0d1HAg8DtxXayd1nuHuTuzcNHNjtLTEREemhOJ8pjAOOM7NjgF7AzmZ2h7uf2b6Du6/psP/NwNUxxiMiCbRp0yZaW1t57733qh1KJvTq1YuGhgbq6+t79P7YkoK7TwWmApjZ54ALOyaEsH1Xd3893DyO4IG0iNSQ1tZW+vbtS2NjI2ZW7XBSzd1Zs2YNra2t7Lbbbj3qo+J1CmY2zcyOCzfPN7PnzOwZ4HxgUqXjEZHqeu+99xgwYIASQhmYGQMGDCjpqqsSQ1Jx9wXAgvDnSzq0564mRLJm3pI2pj/4IqvWbmRQ/95MGT+ME0YPrnZYiaSEUD6lnsuKJAWRWjNvSRtT5y5j46YtALSt3cjUucsAlBgk0TTNhUgMpj/4Yi4htNu4aQvTH3yxShGJFEdJQSQGq9ZujNQuxZs5ExobYbvtgu8zZ5bW39q1a/n5z3/eo/ded911bNiwobQA8lxyySU8/PDDXe6zYMECnnzyybL+3nZKCiIxGNS/d6R2Kc7MmTB5Mrz6KrgH3ydPLi0xJC0pTJs2jSOPPLLLfZQURFJmyvhh9K6v26atd30dU8YPq1JE2XDxxZD/N3jDhqC9py666CJefvllRo0axZQpU5g+fTr77rsvI0eO5NJLLwXg3Xff5dhjj+Uzn/kMw4cP5+677+b6669n1apVHHbYYRx22GGd9r/TTjvx3e9+lzFjxnDEEUfQPitDS0sLBxxwACNHjuSLX/wib731FgCTJk1izpw5QDDX26WXXsqYMWMYMWIEL7zwAq+88go33XQT1157LaNGjeKJJ57o+cEXoKQgEoMTRg/myokjGNy/NwYM7t+bKyeO0EPmEr32WrT2Ylx11VXssccetLS08PnPf56XXnqJp59+mpaWFhYtWsTjjz/O/PnzGTRoEM888wzPPvssRx11FOeffz6DBg3iscce47HHHuu0/3fffZcxY8awePFiDj30UC6//HIAvvKVr3D11VezdOlSRowYkWvPt8suu7B48WLOPfdcrrnmGhobGznnnHP4zne+Q0tLCwcffHDPD74AjT4SickJowcrCZTZ0KHBLaNC7eXw0EMP8dBDDzF69GgA1q9fz0svvcTBBx/MhRdeyPe//30mTJgQ6Q/xdtttx6mnngrAmWeeycSJE1m3bh1r167l0EMPBeCrX/0qJ598csH3T5w4EYCxY8cyd+7cUg6vuHhj/w0iImVyxRXQp8+2bX36BO3l4O5MnTqVlpYWWlpaWLFiBWeffTaf/vSnWbRoESNGjGDq1KlMmzatx78jah3BDjvsAEBdXR2bN2/u8e8tlpKCiKTGGWfAjBnwyU+CWfB9xoygvaf69u3LO++8A8D48eO59dZbWb9+PQBtbW288cYbrFq1ij59+nDmmWdy4YUXsnjx4g+9tzMffPBB7hnBnXfeyUEHHUS/fv346Ec/mnsecPvtt+euGqLGXG66fSQiqXLGGaUlgXwDBgxg3LhxDB8+nKOPPpovfelLHHjggUDwkPiOO+5gxYoVTJkyhe222476+npuvPFGACZPnszRRx/Nrrvu2ulzhR133JHnnnuOsWPH0q9fP+6++24AbrvtNs455xw2bNjA7rvvzi9/+cuiY/7CF77ASSedxD333MPPfvazsj5XMHcvW2eV0NTU5Fp5TSQ7li9fzl577VXtMGKz00475a48KqXQOTWzRe7e1N17dftIRERydPtIRKQM9t9/f95///1t2m6//faKXyWUSklBRKQMFi5cWO0QykK3j0REJEdJQUREcnT7SGqeFsMR2UpXClLT2hfDaVu7EWfrYjjzlrRVOzSpoPnz5zNs2DD23HNPrrrqqmqHU1VKClLTtBiObNmyhfPOO48HHniA559/nlmzZvH8889XO6yq0e0jqWlaDCd9yn277+mnn2bPPfdk9913B+C0007jnnvuYe+99y5XyKmiKwWpaVoMJ13iuN3X1tbGkCFDctsNDQ20tdXu7UMlBalpWgwnXeK43Vdoqp+oM5lmiW4fSU1rv+2g0UfpEMftvoaGBlauXJnbbm1tZdCgQT3uL+2UFKTmaTGc9BjUvzdtBRJAKbf79t13X1566SX+/ve/M3jwYO666y7uvPPOUsJMNd0+kqqZt6SNcVc9ym4X/Y5xVz2qYaDSrThu922//fbccMMNjB8/nr322otTTjmFffbZp9RQU0tXClIV7Q8M2+8Ptz8wBPSvdulUXLf7jjnmGI455phyhJh6SgpSFV09MFRSkK7odl+8dPtIqkL1ASLJpKQgVaH6AJFkUlKQqlB9gEgy6ZmCVIXqA0SSKfakYGZ1QDPQ5u4T8l7bAfg1MBZYA5zq7q/EHZMkgx4YiiRPJW4ffQtY3slrZwNvufuewLXA1RWIRyRxVLNRXWeddRYf//jHGT58eLVDqbpYk4KZNQDHAr/oZJfjgdvCn+cAR1gtTzoiNUlrOlTfpEmTmD9/frXDSIS4rxSuA74HfNDJ64OBlQDuvhlYBwyIOSaRRNGaDhEtnQ3XDofL+gffl84uuctDDjmEj33sY2UILv1iSwpmNgF4w90XdbVbgbYPTVloZpPNrNnMmlevXl22GEWSQDUbESydDfedD+tWAh58v+/8siQGCcR5pTAOOM7MXgHuAg43szvy9mkFhgCY2fZAP+DN/I7cfYa7N7l708CBA2MMWaTyVLMRwSPTYFNesty0MWiXsogtKbj7VHdvcPdG4DTgUXc/M2+3e4Gvhj+fFO7z4cnNRTJMNRsRrGuN1i6RVbxOwcymAc3ufi9wC3C7ma0guEI4rdLxiFSbajYi6NcQ3joq0C5lUZGk4O4LgAXhz5d0aH8POLkSMYgkmWo2inTEJcEzhI63kOp7B+0lOP3001mwYAH/+Mc/aGho4PLLL+fss88uMdh0UkWzZM4P5y1j1sKVbHGnzozT9x/Cv54wotphSTmMPCX4/si04JZRv4YgIbS399CsWbPKEFw2KClIpvxw3jLueOq13PYW99y2EkNGjDyl5CQgndOEeJIpsxYWuN/cRbuIbEtJQTJlSyeD1zprl2TQoMPyKfVcKilIptR1MktKZ+1Sfb169WLNmjVKDGXg7qxZs4ZevXr1uA89U5BMOX3/Ids8U+jYLsnU0NBAa2srmq2gPHr16kVDQ8+H6CopSKa0P0zW6KP0qK+vZ7fddqt2GBKytF2yNTU1eXNzc7XDEBFJFTNb5O5N3e2nKwUpqzNu/gt/fnnr9FXj9vgYM79+YBUjqp55S9pUpSypowfNUjb5CQHgzy+/yRk3/6VKEVWP1kiQtFJSkLLJTwjdtWeZ1kiQtFJSEImB1kiQtFJSEImB1kiQtFJSkLIZt0fh5Qw7a88yrZEgaaWkIGUz8+sHfigB1OrooxNGD+bKiSMY3L83Bgzu35srJ47Q6CNJPNUpiIjUANUpSFXENTY/Sr+qDxDpOSUFKZv2sfntQzHbx+YDJf1RjtJvXDGI1Ao9U5CyiWtsfpR+VR8gUholBSmbuMbmR+lX9QEipVFSkLKJa2x+lH5VHyBSGiUFKZu4xuZH6Vf1ASKl0YNmKZv2B7nlHvkTpd+4YhCpFapTEBGpAapTSKgkjKGPGkMSYhaRylBSqKAkjKGPGkMSYhaRytGD5gpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRwlhQpKwhj6qDEkIWYRqRw9aK6gJIyhjxpDEmIWkcqJrU7BzHoBjwM7ECSfOe5+ad4+k4DpQFvYdIO7/6KrflWnICISXRLqFN4HDnf39WZWD/zJzB5w96fy9rvb3b8RYxxSoh/OW8ashSvZ4k6dGafvP4R/PWFEyfsmpf4hKXGIJEG3ScHMdgBOBBo77u/u07p6nweXIOvDzfrwK13l08IP5y3jjqdey21vcc9t5/+xj7JvUuofkhKHSFIU86D5HuB4YDPwboevbplZnZm1AG8Af3D3hQV2O9HMlprZHDMbUmTcUiGzFq4suj3Kvkmpf0hKHCJJUcztowZ3P6onnbv7FmCUmfUHfmtmw9392Q673AfMcvf3zewc4Dbg8Px+zGwyMBlg6NChPQlFemhLJ8+cCrVH2Tcp9Q9JiUMkKYq5UnjSzArfFC6Su68FFgBH5bWvcff3w82bgbGdvH+Guze5e9PAgQNLCUUiqjMruj3Kvkmpf0hKHCJJ0WlSMLNlZrYUOAhYbGYvhrd52tu7ZGYDwysEzKw3cCTwQt4+u3bYPA5Y3pODkPicvn/hO3qF2qPsm5T6h6TEIZIUXd0+mlBi37sCt5lZHUHyme3u95vZNKDZ3e8Fzjez4wieV7wJTCrxd0qZtT8gLmZEUZR9k1L/kJQ4RJKi2zoFM7vd3b/cXVulqE5BRCS6ctYp7JPXcR2d3PuX7sU1Jj5KfUCcfUc5vjSei9RZOhsemQbrWqFfAxxxCYw8pdpRSYJ1mhTMbCrwA6C3mb3d3gz8DzCjArFlTlxj4qPUB8TZd5TjS+O5SJ2ls+G+82FTOJJq3cpgG5QYpFOdPmh29yvdvS8w3d13Dr/6uvsAd59awRgzI64x8VHqA+LsO8rxpfFcpM4j07YmhHabNgbtIp3o6kphTPjjbzr8nOPui2OLKqPiGhMfpT4gzr6jHF8az0XqrGuN1i5C188U/i383gtoAp4huH00ElhIMFRVIhjUvzdtBf7olTomvs6s4B+9zuoG4uo7yvGl8VykTr+G4JZRoXaRTnR1++gwdz8MeBUYExaPjQVGAysqFWCWxDUmPkp9QJx9Rzm+NJ6L1DniEqjPS7L1vYN2kU4UM/rof7n7svYNd3/WzEbFGFNmxTUmPkp9QJx9Rzm+NJ6L1Gl/mKzRRxJBMXUKswgmwLuDYJbTM4Gd3P30+MP7MNUpiIhEV846ha8B5wLfCrcfB24sITZJmSTUHkjKqV4iNbpNCu7+HnBt+CU1Jgm1B5JyqpdIla4mxJsdfl8WToS3zVflQpRqSkLtgaSc6iVSpasrhfbbRaVOjCcploTaA0k51UukSldDUl8PfzwC+Ii7v9rxqzLhSbVFWW9AaxNIQZ3VRaheIpGKWWSnEfgPM3vZzGab2Tc1JLV2JKH2QFJO9RKpUsyD5ksgt1DO14EpwHVAXVfvk2xIQu2BpJzqJVKlmDqFHwLjgJ2AJcCfgCc63F6qKNUpiIhEV846hYkEK6P9Dvgj8FQ4TDXT4hpvH6XfpKwLoNqDhMn6mP+sH18UVTgXxdw+GmNmfQkmwPs8cLOZ/be7Z3ZCvLjG20fpNynrAqj2IGGyPuY/68cXRZXORbcPms1sOMHUFl8FTgVagUdjiygB4hpvH6XfpKwLoNqDhMn6mP+sH18UVToXxdw+uprgttH1wF/dfVOsESVAXOPto/SblHUBVHuQMFkf85/144uiSuei2ysFdz/W3X/i7k/WQkKA+MbbR+m3s/n/K70ugGoPEibrY/6zfnxRVOlcFFOnUHPiGm8fpd+krAug2oOEyfqY/6wfXxRVOhfF3D6qOXGNt4/Sb1LWBVDtQcJkfcx/1o8viiqdi27rFJJGdQoiItGVXKdgZvcRLKpTkLsf18PYaprqH0RS4v4LYNGvwLeA1cHYSTDhp6X3m/A6jK5uH11TsShqhOofRFLi/gug+Zat275l63YpiSEFdRhdzZL6x66+KhlkVqj+QSQlFv0qWnuxUlCH0e2DZjP7FHAlsDfQq73d3XePMa5MUv2DSEr4lmjtxUpBHUYxQ1J/SbAm82bgMODXwO1xBpVVqn8QSQnrZBLoztqLlYI6jGKSQm93f4RgpNKr7n4ZcHi8YWWT6h9EUmLspGjtxUpBHUYxdQrvmdl2wEtm9g2gDfh4vGFlk+ofRFKi/WFyuUcfpaAOo5j1FPYFlgP9gR8B/YCfuPtT8Yf3YapTEBGJrmzrKbj7X8MOtwPOd/d3igygF/A4sEP4e+a4+6V5++xA8IxiLLAGONXdXymm/6ii1gekbQ2BKLUHWT8XsY4DjzJ2Pa44IvQ7cyZcfDG89hoMHQpXXAFnnFGevlMn6rFl+Vx0oZjRR00ED5v7htvrgLPcfVE3b30fONzd15tZPfAnM3sg7wrjbOAtd9/TzE4jmJH11J4cSFei1gekbQ2BKLUHWT8XsY4DjzJ2Pa44IvQ7cyZMngwbNgTbr74abEMniSEFY+h7LOqxZflcdKOYB823Av/X3RvdvRE4jyBJdMkD68PN+vAr/17V8cBt4c9zgCPMyj8MJmp9QNrWEIhSe5D1cxHrOPAoY9fjiiNCvxdfvDUhtNuwIWgvte/UiXpsWT4X3SgmKbzj7k+0b7j7n4BibyHVmVkL8AbwB3dfmLfLYGBl2O9mYB0woEA/k82s2cyaV69eXcyv3kbU+oC0rSEQpfYg6+ci1nHgUcauxxVHhH5fe63Afl20p2EMfY9FPbYsn4tuFJMUnjaz/zCzz5nZoWb2c2CBmY0xszFdvdHdt7j7KKAB2C9cxa2jQlcFH/pL5u4z3L3J3ZsGDhxYRMjbilofkLY1BKLUHmT9XMQ6DjzK2PW44ojQ79ChhXftrD0NY+h7LOqxZflcdKOYpDAK+DRwKXAZsBfwWeDfKHJ+JHdfCywAjsp7qRUYAmBm2xOMbHqzmD6jiFofkLY1BKLUHmT9XMQ6DjzK2PW44ojQ7xVXQJ8+27b16RO0l9p36kQ9tiyfi24UM/rosJ50bGYDgU3uvtbMegNHEjxI7uhegrWf/wKcBDzqMczlHbU+IG1rCESpPcj6uYh1HHiUsetxxRGh3/aHyUWPPkrBGPoei3psWT4X3SimTuGfgB8Dg9z9aDPbGzjQ3W/p5n0jCR4i1xFckcx292lmNg1odvd7w2GrtwOjCa4QTnP3/+qqX9UpiIhEV7Y6BeBXBKON2scs/A24G+gyKbj7UoI/9vntl3T4+T3g5CJiEBGRCijmmcIu7j4b+AByo4RKnCow+eYtaWPcVY+y20W/Y9xVjzJvSVu1Q5IkWDobrh0Ol/UPvi+dXZ594xI1hiQcX9r6zZhirhTeNbMBhKOCzOwAgqGjmZW6gi2pjCgFTUkofoqzYCsBxXmJ6DeDirlSuIDggfAeZvZngmkpvhlrVFWWuoItqYwoBU1JKH6Ks2ArAcV5ieg3g4oZfbTYzA4FhhHUFbzo7ptij6yKUlewJZURpaApCcVPcRZsJaA4LxH9ZlC3VwpmdjLBmgrPAScAd3dXtJZ2qSvYksqIUtCUhOKnOAu2ElCcl4h+M6iY20f/4u7vmNlBwHiCYaY3xhtWdaWuYEsqI0pBUxKKn+Is2EpAcV4i+s2gYpJC+831Y4Eb3f0e4CPxhVR9J4wezJUTRzC4f28MGNy/N1dOHKGHzLVu5Cnwheuh3xDAgu9fuL7wg8oo+yYh3qj7x3V8aes3g4opXrufYLW1IwnWPdgIPO3un4k/vA9T8ZqISHTlLF47hWDOomvCKSt2BaaUGqBI5kVZkCcp0hZzUhbCSUocZVDM6KMNwNwO268Dr8cZlEjqRVmQJynSFnNSag+SEkeZFPNMQUSiirIgT1KkLeak1B4kJY4yUVIQiUOUBXmSIm0xJ6X2IClxlImSgkgcoizIkxRpizkptQdJiaNMlBRE4hBlQZ6kSFvMSak9SEocZaKkIBKHCT+FprO3/ivb6oLtJD6wbZe2mJNSe5CUOMqk2zqFpFGdgohIdOWsUxCJRxrHdscVc1z1AWk8x1JVSgpSHWkc2x1XzHHVB6TxHEvV6ZmCVEcax3bHFXNc9QFpPMdSdUoKUh1pHNsdV8xx1Qek8RxL1SkpSHWkcWx3XDHHVR+QxnMsVaekINWRxrHdccUcV31AGs+xVJ2SglRHGsd2xxVzXPUBaTzHUnWqUxARqQHF1inoSkFk6Wy4djhc1j/4vnR25fuNKwaRiFSnILUtrrH8UfpVPYEkiK4UpLbFNZY/Sr+qJ5AEUVKQ2hbXWP4o/aqeQBJESUFqW1xj+aP0q3oCSRAlBaltcY3lj9Kv6gkkQZQUpLbFNZY/Sr+qJ5AEia1OwcyGAL8GPgF8AMxw93/P2+dzwD3A38Omue7e5dM11SmIiESXhPUUNgPfdffFZtYXWGRmf3D35/P2e8LdJ8QYh1RSGufvjxJzGo8vCXTeUiO2pODurwOvhz+/Y2bLgcFAflKQrEjjeHvVE8RP5y1VKvJMwcwagdHAwgIvH2hmz5jZA2a2TyXikZikcby96gnip/OWKrFXNJvZTsB/At9297fzXl4MfNLd15vZMcA84FMF+pgMTAYYOnRozBFLj6VxvL3qCeKn85YqsV4pmFk9QUKY6e5z819397fdfX348++BejPbpcB+M9y9yd2bBg4cGGfIUoo0jrdXPUH8dN5SJbakYGYG3AIsd/eCcwCb2SfC/TCz/cJ41sQVk8QsjePtVU8QP523VInz9tE44MvAMjNrCdt+AAwFcPebgJOAc81sM7AROM3TNpe3bNX+0DBNo0yixJzG40sCnbdU0XoKIiI1IAl1CpJUGjO+rfsvgEW/At8SrHo2dlLpq56JpJSSQq3RmPFt3X8BNN+yddu3bN1WYpAapLmPao3GjG9r0a+itYtknJJCrdGY8W35lmjtIhmnpFBrNGZ8W1YXrV0k45QUao3GjG9r7KRo7SIZp6RQazR3/7Ym/BSazt56ZWB1wbYeMkuNUp2CiEgNUJ1CBc1b0sb0B19k1dqNDOrfmynjh3HC6MHVDqt8sl7XkPXjSwKd49RQUijRvCVtTJ27jI2bgtEqbWs3MnXuMoBsJIas1zVk/fiSQOc4VfRMoUTTH3wxlxDabdy0hekPvliliMos63UNWT++JNA5ThUlhRKtWrsxUnvqZL2uIevHlwQ6x6mipFCiQf17R2pPnazXNWT9+JJA5zhVlBRKNGX8MHrXb1vo1Lu+jinjh1UpojLLel1D1o8vCXSOU0UPmkvU/jA5s6OPsj4XftaPLwl0jlNFdQoiIjWg2DoF3T4SybKls+Ha4XBZ/+D70tnp6FuqRrePRLIqzvoA1R5klq4URLIqzvoA1R5klpKCSFbFWR+g2oPMUlIQyao46wNUe5BZSgoiWRVnfYBqDzJLSUEkq+JcO0PrcmSW6hRERGqA6hRERCQyJQUREclRUhARkRwlBRERyVFSEBGRHCUFERHJUVIQEZEcJQUREcmJLSmY2RAze8zMlpvZc2b2rQL7mJldb2YrzGypmY2JKx4pgebNF6kZca6nsBn4rrsvNrO+wCIz+4O7P99hn6OBT4Vf+wM3ht8lKTRvvkhNie1Kwd1fd/fF4c/vAMuB/IWLjwd+7YGngP5mtmtcMUkPaN58kZpSkWcKZtYIjAYW5r00GFjZYbuVDycOzGyymTWbWfPq1avjClMK0bz5IjUl9qRgZjsB/wl8293fzn+5wFs+NEOfu89w9yZ3bxo4cGAcYUpnNG++SE2JNSmYWT1BQpjp7nML7NIKDOmw3QCsijMmiUjz5ovUlDhHHxlwC7Dc3X/ayW73Al8JRyEdAKxz99fjikl6QPPmi9SUOEcfjQO+DCwzs5aw7QfAUAB3vwn4PXAMsALYAHwtxnikp0aeoiQgUiNiSwru/icKPzPouI8D58UVg4iIRKOKZhERyVFSEBGRHCUFERHJUVIQEZEcJQUREclRUhARkRwlBRERybGgVCA9zGw18Gq14+jELsA/qh1EjHR86ZXlYwMdXzE+6e7dTh6XuqSQZGbW7O5N1Y4jLjq+9MrysYGOr5x0+0hERHKUFEREJEdJobxmVDuAmOn40ivLxwY6vrLRMwUREcnRlYKIiOQoKfSAmdWZ2RIzu7/Aa5PMbLWZtYRf/7saMZbCzF4xs2Vh/M0FXjczu97MVpjZUjMbU404e6KIY/ucma3r8Pmlaok5M+tvZnPM7AUzW25mB+a9ntrPDoo6vtR+fmY2rEPcLWb2tpl9O2+f2D+/OBfZybJvAcuBnTt5/W53/0YF44nDYe7e2bjoo4FPhV/7AzeG39Oiq2MDeMLdJ1QsmvL6d2C+u59kZh8B+uS9nvbPrrvjg5R+fu7+IjAKgn94Am3Ab/N2i/3z05VCRGbWABwL/KLasVTR8cCvPfAU0N/Mdq12ULXOzHYGDiFYBhd3/x93X5u3W2o/uyKPLyuOAF529/xC3dg/PyWF6K4Dvgd80MU+J4aXdnPMbEiF4ionBx4ys0VmNrnA64OBlR22W8O2NOju2AAONLNnzOwBM9unksGVaHdgNfDL8PbmL8xsx7x90vzZFXN8kN7Pr6PTgFkF2mP//JQUIjCzCcAb7r6oi93uAxrdfSTwMHBbRYIrr3HuPobgUvU8Mzsk7/VCy6ymZRhbd8e2mGA6gM8APwPmVTrAEmwPjAFudPfRwLvARXn7pPmzK+b40vz5ARDeFjsO+E2hlwu0lfXzU1KIZhxwnJm9AtwFHG5md3Tcwd3XuPv74ebNwNjKhlg6d18Vfn+D4J7mfnm7tAIdr4AagFWABkzgAAAEB0lEQVSVia403R2bu7/t7uvDn38P1JvZLhUPtGdagVZ3XxhuzyH4I5q/Tyo/O4o4vpR/fu2OBha7+38XeC32z09JIQJ3n+ruDe7eSHB596i7n9lxn7z7e8cRPJBODTPb0cz6tv8M/DPwbN5u9wJfCUdCHACsc/fXKxxqZMUcm5l9wsws/Hk/gv9H1lQ61p5w9/8HrDSzYWHTEcDzebul8rOD4o4vzZ9fB6dT+NYRVODz0+ijMjCzaUCzu98LnG9mxwGbgTeBSdWMrQf+Cfht+P/V9sCd7j7fzM4BcPebgN8DxwArgA3A16oUa1TFHNtJwLlmthnYCJzm6arw/CYwM7wF8V/A1zLy2bXr7vhS/fmZWR/g88D/6dBW0c9PFc0iIpKj20ciIpKjpCAiIjlKCiIikqOkICIiOUoKIiKSo6QgElE4E2ehGXILtpfh951gZnt32F5gZpldj1iqS0lBJPlOAPbudi+RMlBSkMwJK5d/F06K9qyZnRq2jzWzP4aT4T3YXn0e/sv7OjN7Mtx/v7B9v7BtSfh9WFe/t0AMt5rZX8P3Hx+2TzKzuWY238xeMrOfdHjP2Wb2tzCem83sBjP7LEFl/HQL5tjfI9z9ZDN7Otz/4DKdOhFVNEsmHQWscvdjAcysn5nVE0yQdry7rw4TxRXAWeF7dnT3z4YT5N0KDAdeAA5x981mdiTwY+DEImO4mGAalLPMrD/wtJk9HL42ChgNvA+8aGY/A7YA/0Iwl887wKPAM+7+pJndC9zv7nPC4wHY3t33M7NjgEuBI3tyokTyKSlIFi0DrjGzqwn+mD5hZsMJ/tD/IfyjWgd0nDNmFoC7P25mO4d/yPsCt5nZpwhmoqyPEMM/E0yeeGG43QsYGv78iLuvAzCz54FPArsAf3T3N8P23wCf7qL/ueH3RUBjhLhEuqSkIJnj7n8zs7EEc8RcaWYPEcyI+py7H9jZ2wps/wh4zN2/aGaNwIIIYRhwYria1tZGs/0JrhDabSH4/7DQlMhdae+j/f0iZaFnCpI5ZjYI2ODudwDXENySeREYaOGavmZWb9suwNL+3OEggpkn1wH9CJZEhOgTGz4IfLPDjJ2ju9n/aeBQM/uomW3Ptrep3iG4ahGJnf6FIVk0guDB7AfAJuBcd/8fMzsJuN7M+hH8t38d8Fz4nrfM7EmCdbfbnzP8hOD20QUE9/ij+FHY/9IwMbwCdLpusLu3mdmPgYUE8+M/D6wLX74LuNnMzieYBVQkNpolVWqemS0ALnT35irHsZO7rw+vFH4L3Oru+Qu3i8RKt49EkuMyM2shWPjn76RwKUlJP10piIhIjq4UREQkR0lBRERylBRERCRHSUFERHKUFEREJEdJQUREcv4/C4hKF413pBoAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x2c55bd6b3c8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.scatter(df[:50]['sepal length'], df[:50]['sepal width'], label='0')\n",
    "plt.scatter(df[50:100]['sepal length'], df[50:100]['sepal width'], label='1')\n",
    "plt.plot(test_point[0], test_point[1], 'bo', label='test_point')\n",
    "plt.xlabel('sepal length')\n",
    "plt.ylabel('sepal width')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn实例"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
       "           metric_params=None, n_jobs=None, n_neighbors=5, p=2,\n",
       "           weights='uniform')"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf_sk = KNeighborsClassifier()\n",
    "clf_sk.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.95"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf_sk.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### sklearn.neighbors.KNeighborsClassifier\n",
    "\n",
    "- n_neighbors: 临近点个数\n",
    "- p: 距离度量\n",
    "- algorithm: 近邻算法，可选{'auto', 'ball_tree', 'kd_tree', 'brute'}\n",
    "- weights: 确定近邻的权重"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### kd树"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**kd**树是一种对k维空间中的实例点进行存储以便对其进行快速检索的树形数据结构。\n",
    "\n",
    "**kd**树是二叉树，表示对$k$维空间的一个划分（partition）。构造**kd**树相当于不断地用垂直于坐标轴的超平面将$k$维空间切分，构成一系列的k维超矩形区域。kd树的每个结点对应于一个$k$维超矩形区域。\n",
    "\n",
    "构造**kd**树的方法如下：\n",
    "\n",
    "构造根结点，使根结点对应于$k$维空间中包含所有实例点的超矩形区域；通过下面的递归方法，不断地对$k$维空间进行切分，生成子结点。在超矩形区域（结点）上选择一个坐标轴和在此坐标轴上的一个切分点，确定一个超平面，这个超平面通过选定的切分点并垂直于选定的坐标轴，将当前超矩形区域切分为左右两个子区域\n",
    "（子结点）；这时，实例被分到两个子区域。这个过程直到子区域内没有实例时终止（终止时的结点为叶结点）。在此过程中，将实例保存在相应的结点上。\n",
    "\n",
    "通常，依次选择坐标轴对空间切分，选择训练实例点在选定坐标轴上的中位数\n",
    "（median）为切分点，这样得到的**kd**树是平衡的。注意，平衡的**kd**树搜索时的效率未必是最优的。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 构造平衡kd树算法\n",
    "输入：$k$维空间数据集$T＝\\{x_1，x_2,…,x_N\\}$，\n",
    "\n",
    "其中$x_{i}=\\left(x_{i}^{(1)}, x_{i}^{(2)}, \\cdots, x_{i}^{(k)}\\right)^{\\mathrm{T}}$ ，$i＝1,2,…,N$；\n",
    "\n",
    "输出：**kd**树。\n",
    "\n",
    "（1）开始：构造根结点，根结点对应于包含$T$的$k$维空间的超矩形区域。\n",
    "\n",
    "选择$x^{(1)}$为坐标轴，以T中所有实例的$x^{(1)}$坐标的中位数为切分点，将根结点对应的超矩形区域切分为两个子区域。切分由通过切分点并与坐标轴$x^{(1)}$垂直的超平面实现。\n",
    "\n",
    "由根结点生成深度为1的左、右子结点：左子结点对应坐标$x^{(1)}$小于切分点的子区域， 右子结点对应于坐标$x^{(1)}$大于切分点的子区域。\n",
    "\n",
    "将落在切分超平面上的实例点保存在根结点。\n",
    "\n",
    "（2）重复：对深度为$j$的结点，选择$x^{(1)}$为切分的坐标轴，$l＝j(modk)+1$，以该结点的区域中所有实例的$x^{(1)}$坐标的中位数为切分点，将该结点对应的超矩形区域切分为两个子区域。切分由通过切分点并与坐标轴$x^{(1)}$垂直的超平面实现。\n",
    "\n",
    "由该结点生成深度为$j+1$的左、右子结点：左子结点对应坐标$x^{(1)}$小于切分点的子区域，右子结点对应坐标$x^{(1)}$大于切分点的子区域。\n",
    "\n",
    "将落在切分超平面上的实例点保存在该结点。\n",
    "\n",
    "（3）直到两个子区域没有实例存在时停止。从而形成**kd**树的区域划分。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# kd-tree每个结点中主要包含的数据结构如下\n",
    "class KdNode(object):\n",
    "    def __init__(self, dom_elt, split, left, right):\n",
    "        self.dom_elt = dom_elt  # k维向量节点(k维空间中的一个样本点)\n",
    "        self.split = split  # 整数（进行分割维度的序号）\n",
    "        self.left = left  # 该结点分割超平面左子空间构成的kd-tree\n",
    "        self.right = right  # 该结点分割超平面右子空间构成的kd-tree\n",
    "\n",
    "\n",
    "class KdTree(object):\n",
    "    def __init__(self, data):\n",
    "        k = len(data[0])  # 数据维度\n",
    "\n",
    "        def CreateNode(split, data_set):  # 按第split维划分数据集exset创建KdNode\n",
    "            if not data_set:  # 数据集为空\n",
    "                return None\n",
    "            # key参数的值为一个函数，此函数只有一个参数且返回一个值用来进行比较\n",
    "            # operator模块提供的itemgetter函数用于获取对象的哪些维的数据，参数为需要获取的数据在对象中的序号\n",
    "            #data_set.sort(key=itemgetter(split)) # 按要进行分割的那一维数据排序\n",
    "            data_set.sort(key=lambda x: x[split])\n",
    "            split_pos = len(data_set) // 2  # //为Python中的整数除法\n",
    "            median = data_set[split_pos]  # 中位数分割点\n",
    "            split_next = (split + 1) % k  # cycle coordinates\n",
    "\n",
    "            # 递归的创建kd树\n",
    "            return KdNode(\n",
    "                median,\n",
    "                split,\n",
    "                CreateNode(split_next, data_set[:split_pos]),  # 创建左子树\n",
    "                CreateNode(split_next, data_set[split_pos + 1:]))  # 创建右子树\n",
    "\n",
    "        self.root = CreateNode(0, data)  # 从第0维分量开始构建kd树,返回根节点\n",
    "\n",
    "\n",
    "# KDTree的前序遍历\n",
    "def preorder(root):\n",
    "    print(root.dom_elt)\n",
    "    if root.left:  # 节点不为空\n",
    "        preorder(root.left)\n",
    "    if root.right:\n",
    "        preorder(root.right)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 对构建好的kd树进行搜索，寻找与目标点最近的样本点：\n",
    "from math import sqrt\n",
    "from collections import namedtuple\n",
    "\n",
    "# 定义一个namedtuple,分别存放最近坐标点、最近距离和访问过的节点数\n",
    "result = namedtuple(\"Result_tuple\",\n",
    "                    \"nearest_point  nearest_dist  nodes_visited\")\n",
    "\n",
    "\n",
    "def find_nearest(tree, point):\n",
    "    k = len(point)  # 数据维度\n",
    "\n",
    "    def travel(kd_node, target, max_dist):\n",
    "        if kd_node is None:\n",
    "            return result([0] * k, float(\"inf\"),\n",
    "                          0)  # python中用float(\"inf\")和float(\"-inf\")表示正负无穷\n",
    "\n",
    "        nodes_visited = 1\n",
    "\n",
    "        s = kd_node.split  # 进行分割的维度\n",
    "        pivot = kd_node.dom_elt  # 进行分割的“轴”\n",
    "\n",
    "        if target[s] <= pivot[s]:  # 如果目标点第s维小于分割轴的对应值(目标离左子树更近)\n",
    "            nearer_node = kd_node.left  # 下一个访问节点为左子树根节点\n",
    "            further_node = kd_node.right  # 同时记录下右子树\n",
    "        else:  # 目标离右子树更近\n",
    "            nearer_node = kd_node.right  # 下一个访问节点为右子树根节点\n",
    "            further_node = kd_node.left\n",
    "\n",
    "        temp1 = travel(nearer_node, target, max_dist)  # 进行遍历找到包含目标点的区域\n",
    "\n",
    "        nearest = temp1.nearest_point  # 以此叶结点作为“当前最近点”\n",
    "        dist = temp1.nearest_dist  # 更新最近距离\n",
    "\n",
    "        nodes_visited += temp1.nodes_visited\n",
    "\n",
    "        if dist < max_dist:\n",
    "            max_dist = dist  # 最近点将在以目标点为球心，max_dist为半径的超球体内\n",
    "\n",
    "        temp_dist = abs(pivot[s] - target[s])  # 第s维上目标点与分割超平面的距离\n",
    "        if max_dist < temp_dist:  # 判断超球体是否与超平面相交\n",
    "            return result(nearest, dist, nodes_visited)  # 不相交则可以直接返回，不用继续判断\n",
    "\n",
    "        #----------------------------------------------------------------------\n",
    "        # 计算目标点与分割点的欧氏距离\n",
    "        temp_dist = sqrt(sum((p1 - p2)**2 for p1, p2 in zip(pivot, target)))\n",
    "\n",
    "        if temp_dist < dist:  # 如果“更近”\n",
    "            nearest = pivot  # 更新最近点\n",
    "            dist = temp_dist  # 更新最近距离\n",
    "            max_dist = dist  # 更新超球体半径\n",
    "\n",
    "        # 检查另一个子结点对应的区域是否有更近的点\n",
    "        temp2 = travel(further_node, target, max_dist)\n",
    "\n",
    "        nodes_visited += temp2.nodes_visited\n",
    "        if temp2.nearest_dist < dist:  # 如果另一个子结点内存在更近距离\n",
    "            nearest = temp2.nearest_point  # 更新最近点\n",
    "            dist = temp2.nearest_dist  # 更新最近距离\n",
    "\n",
    "        return result(nearest, dist, nodes_visited)\n",
    "\n",
    "    return travel(tree.root, point, float(\"inf\"))  # 从根节点开始递归"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 例3.2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[7, 2]\n",
      "[5, 4]\n",
      "[2, 3]\n",
      "[4, 7]\n",
      "[9, 6]\n",
      "[8, 1]\n"
     ]
    }
   ],
   "source": [
    "data = [[2,3],[5,4],[9,6],[4,7],[8,1],[7,2]]\n",
    "kd = KdTree(data)\n",
    "preorder(kd.root)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import clock\n",
    "from random import random\n",
    "\n",
    "# 产生一个k维随机向量，每维分量值在0~1之间\n",
    "def random_point(k):\n",
    "    return [random() for _ in range(k)]\n",
    " \n",
    "# 产生n个k维随机向量 \n",
    "def random_points(k, n):\n",
    "    return [random_point(k) for _ in range(n)]     "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Result_tuple(nearest_point=[2, 3], nearest_dist=1.8027756377319946, nodes_visited=4)\n"
     ]
    }
   ],
   "source": [
    "ret = find_nearest(kd, [3,4.5])\n",
    "print (ret)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "time:  5.4623788 s\n",
      "Result_tuple(nearest_point=[0.09929288205798159, 0.4954936771850429, 0.8005722800665575], nearest_dist=0.004597223680778027, nodes_visited=42)\n"
     ]
    }
   ],
   "source": [
    "N = 400000\n",
    "t0 = clock()\n",
    "kd2 = KdTree(random_points(3, N))            # 构建包含四十万个3维空间样本点的kd树\n",
    "ret2 = find_nearest(kd2, [0.1,0.5,0.8])      # 四十万个样本点中寻找离目标最近的点\n",
    "t1 = clock()\n",
    "print (\"time: \",t1-t0, \"s\")\n",
    "print (ret2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "参考代码：https://github.com/wzyonggege/statistical-learning-method\n",
    "\n",
    "中文注释制作：机器学习初学者\n",
    "\n",
    "微信公众号：ID:ai-start-com\n",
    "\n",
    "配置环境：python 3.5+\n",
    "\n",
    "代码全部测试通过。\n",
    "![gongzhong](../gongzhong.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
